5  Making Figures

As you create reproducible analyses, you’ll want to incorporate data visualizations including figures (plots, graphs, images). You can make use of the code you’ve learned this semester inside Quarto documents. The new aspect is to capitalize on the appropriate code chunk options to take your visualizations to the next level.

5.1 Graphs and Plots

Code chunks that produce figures have several more chunk options that you can make use of to ensure that your figures look good. For my example here, I’m going to use the code for the popularity of Stat 184 Instructor Names from the Baby Names data set (Kaplan & Beckman, 2020).

Listing 5.1: Code Chunk for Making a Figure
```{r}
#| label: fig-namesPlot
#| fig-cap: "Popularity of the Stat 184 Instructor's First Names Over Time"
#| fig-pos: H
#| fig-height: 5
#| fig-alt: "Line graph showing popularity of four names over time."
#| aria-describedby: namesPlotLD
#| lst-label: lst-figure1
#| lst-cap: "Code Chunk for Making a Figure"
# Set Color Palette ----
psuPalette <- c("#1E407C", "#BC204B", "#3EA39E", "#E98300",
                "#999999", "#AC8DCE", "#F2665E", "#99CC00")

# Make Data Visualization ----
ggplot(
  data = subsetNames,
  mapping = aes(x = year, y = total, color = name, linetype = name)
) +
  geom_line(linewidth = 0.75) +
  theme_bw() +
  labs(
    x = "Year",
    y = "Total Number of People with Name",
    color = "Name",
    linetype = "Name"
  ) +
  theme(
    text = element_text(size = 14),
    legend.key.size = unit(1, 'cm')
  ) + 
  scale_color_manual(
    values = psuPalette
  ) +
  scale_y_continuous(
    expand = expansion(mult = 0.01)
  )
```
Figure 5.1: Popularity of the Stat 184 Instructor’s First Names Over Time
Line graph showing popularity of four names over time.
Long Description

The horizontal axis is labelled "Year" and goes from about 187 to about 2024 with labels of 1875, 1900, 1925, 1950, 1975, and 2000.

The vertical axis is labelled "Total Number of People with Name" and goes from 0 to about 22000 with labels of 0, 5000, 10000, 15000, and 20000.

There is a legend showing that color and line type are used for Name which has four levels.

  • Abby is a dark blue, solid line.
  • Edward is a red, short dashed line.
  • Neil is a teal, long dashed line.
  • Padma is a golden long dashed-long gapped line.

The plot contains four lines, one for each name. Data stops in 2013.

  • The line for Abby starts out at essentially 0 and continues flat until 1960 where there is a small increase. In 1976, Abby's popularity increases and generally continues to increase until about 2004 when the popularity of Abby decreases until 2013.
  • The line for Edward starts out higher than other lines, around 2500. The popularity is consistent until 1900 when there is a sharp increase in the popularity to nearly 22000 in 1925. There is then a decrease in popularity until 1930. Edward increases in popularity again until about 1955 which is the start of a steady decline in popularity until 2013.
  • The line for Neil starts around zero but starts increasing around 1912. The increase is steady until about 1952. The popularity of Neil slowly decreases from 1952 to 2013.
  • The line for Padma is the smallest line and only begins in 1968. The line is relatively flat throughout time.

Listing 5.1 shows an example code chunk for making a figure. Notice that the first option is label and that the value starts with fig-. This will allow me to reference this plot in the body of my document. The fig-cap option is what sets the caption. If you used the title argument of the labs function, you would move the text string from title to fig-cap. There is no need to use both a caption and a title.

Similar to tables, when you render to PDF or Word (or other formats with actual pages), Quarto will occasionally move the figure around. The fig-pos option will override this just like tbl-pos does. The other two options to play around with are fig-height and fig-width. These control the height and width of the figure, respectively. The number given is interpreted as a number of inches. You’ll notice in Listing 5.1 that I only specified fig-height. I recommend that you only set one of height or width and let the computer scale the other dimension. This will help keep your figure from getting stretched/squished in weird ways if you attempt to set both.
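As a rough illustration of setting only one dimension, the sketch below fixes fig-width and leaves fig-height unset. The chunk label, caption, alt text, and the use of the mpg data set that ships with {ggplot2} are my own choices for illustration and are not part of the course materials.

```{r}
#| label: fig-widthOnly
#| fig-cap: "Highway Fuel Economy by Engine Displacement"
#| fig-alt: "Scatter plot showing highway fuel economy tending to decrease as engine displacement increases."
#| fig-pos: H
#| fig-width: 6

library(ggplot2)

# Only fig-width is set in the chunk options; fig-height is deliberately
# left unset so that both dimensions are not forced at once.
widthOnlyPlot <- ggplot(
  data = mpg,
  mapping = aes(x = displ, y = hwy)
) +
  geom_point() +
  labs(
    x = "Engine Displacement (L)",
    y = "Highway Fuel Economy (mpg)"
  ) +
  theme_bw()

# Display the plot so that it becomes the figure for this chunk
widthOnlyPlot
```

If the rendered figure looks too tall or too short, swap fig-width for fig-height rather than adding the second option.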

5.2 Static Images

You can also incorporate static images into your Quarto documents. These would be visualizations that aren’t created via code in your document. To see how to include static images, please refer to the Quarto Figures page.

Tip: Best Practice

When using static images, it is a good idea to create a dedicated folder in your RStudio Project/GitHub repo that will hold the original image files. You can then use relative file paths in the calls to these static images.
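One possible way to follow this practice from inside a code chunk is the include_graphics function from {knitr}, sketched below. The folder name images, the file name campusMap.png, and the label, caption, and alt text are all hypothetical placeholders; the Quarto Figures page describes the Markdown image syntax, which works just as well.

```{r}
#| label: fig-campusMap
#| fig-cap: "Map of Campus (placeholder caption)"
#| fig-alt: "Placeholder alt text describing what the static image shows."

# Relative path, starting from the folder that holds this Quarto file.
# Both the folder and file name are hypothetical examples.
imagePath <- file.path("images", "campusMap.png")

# Insert the static image as the figure for this chunk
knitr::include_graphics(imagePath)
```

Because the path is relative, the same code should keep working when someone else clones the repo, provided the images folder travels with it.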

5.3 Alt Text and Long Descriptions

Note: Required Starting Fall 2025

You are required to include alt text (or long descriptions) for your course project, as such practices make your reports more inclusive and accessible, which are principles at the heart of Open Science.

As we discussed in class, knowing how to add alt text to any figure, including plots and graphs, will put you ahead of the curve. As a reminder, we talked about two tools that can help you in this endeavor.

  • ASU’s Image Accessibility AI Tool
    • Upload your image, optionally add context, and then review the generated Alt Text and Long Description.
  • The {BrailleR} Package
    • For several plots created with {ggplot2}, you can generate an initial Long Description.
Important: Always Review

Make sure that you review the generated Alt Text and Long Description when using these tools and adjust as necessary. These tools can and do make mistakes.
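For the {BrailleR} route, a minimal sketch might look like the following. It assumes that the VI function from {BrailleR} can handle the type of plot you made; which {ggplot2} layers are supported, and the exact wording of the output, depend on the package version. The object name diamondBox is just an example.

```{r}
library(ggplot2)

# Build a simple box plot of diamond mass and store it in an object
diamondBox <- ggplot(
  data = diamonds,
  mapping = aes(x = carat)
) +
  geom_boxplot() +
  labs(x = "Mass (ct)")

# VI() prints a text description of the stored plot.
# The call is skipped when {BrailleR} is not installed.
if (requireNamespace("BrailleR", quietly = TRUE)) {
  BrailleR::VI(diamondBox)
}
```

Treat whatever text comes back as a first draft to edit, not as a finished Long Description.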

Returning to Listing 5.1, we can see two options that are for image accessibility for HTML outputs: fig-alt and aria-describedby. (They will be ignored when rendering to anything else.) When creating web files, you should always include Alt Text on visualizations. You can learn more about Alt Text at Penn State’s Accessibility page Image ALT Text (2014).

Tip: Best Practice

Remember that alt text should describe what is actually shown and only be ~140 characters long.
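If you want a quick way to check the length guideline, something like the sketch below could help. The function altTextIsConcise is a hypothetical helper that I made up for this example; it is not part of Quarto, {ggplot2}, or {BrailleR}, and the 140-character limit is a guideline rather than a hard rule.

```{r}
# Hypothetical helper: TRUE when the alt text fits within maxChars characters
altTextIsConcise <- function(altText, maxChars = 140) {
  nchar(altText) <= maxChars
}

# Check the alt text used in Listing 5.1
namesAlt <- "Line graph showing popularity of four names over time."
namesAltOk <- altTextIsConcise(namesAlt)
namesAltOk
```

A length check says nothing about whether the text describes what is actually shown, so you still need to read it over yourself.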

5.3.1 Setting the fig-alt Option Manually

The easiest approach to attaching alt text to your plot/graph is to use the fig-alt option as shown in Listing 5.1. As you’re setting the code chunk options for your plot, you can specify the fig-alt option and give a character string (make sure to use quotation marks!). Listing 5.2 provides a focused look at the key option that you will need to use for adding alt text to a graph or plot.

Listing 5.2: Example Setting Alt Text Manually
#| label: fig-diamondsBox
#| fig-cap: "Box plot of Diamond Mass"
#| fig-alt: "A box plot of diamond mass showing positive skewness with a median of 0.7 carats, stretching to a max of about 5 carats."

# [Code for making a box plot of diamond masses]

Notice that the values for the keys fig-cap and fig-alt are not the same. Figure captions essentially function as a plot title. If you are using an image that someone else made, you also cite the source as part of the caption (and never part of the alt text). Your alt text needs to describe the plot in a useful, but concise manner.

Caution: Wrapped Text

Be careful when copying the example in Listing 5.2; I’ve told Quarto to automatically wrap code lines. Thus, the alt text does not actually consist of two lines; just one.

5.3.2 Using the alt label of {ggplot2}

Remember that the labs function from {ggplot2} has an alt argument that you can use. This will link alt text to the plot at the moment of creation. One potential benefit to using this alt argument is its ability to incorporate R variables and functions. Listing 5.3 gives an example of using the alt argument to re-create the alt text found in Listing 5.2. Using functions as part of your alt text allows the alt text to update automatically. If we were to alter the data in the diamonds data frame and then re-render our Quarto file, our alt text here would update.

Listing 5.3: Code Chunk Showing Use of the alt Argument
```{r}
#| label: fig-diamondBox
#| fig-alt: !expr get_alt_text(get_last_plot())
#| fig-height: 2
#| aria-describedby: diamondBoxLD
#| lst-label: lst-altText2
#| lst-cap: "Code Chunk Showing Use of the alt Argument"

ggplot(
  data = diamonds,
  mapping = aes(x = carat)
) +
  geom_boxplot() +
  labs(
    x = "Mass (ct)",
    alt = paste("A box plot of diamond mass showing positive skewness with a median of", median(diamonds$carat), "carats, stretching to a max of about", max(diamonds$carat), "carats.")
  ) +
  theme_bw() +
  theme(
    axis.ticks.y = element_blank(),
    axis.text.y = element_blank()
  )
```
Figure 5.2
A box plot of diamond mass showing positive skewness with a median of 0.7 carats, stretching to a max of about 5.01 carats.
Long Description

The horizontal axis is labelled 'Mass (ct)' and goes from 0 to a little past 5 with labels at 0, 1, 2, 3, 4, and 5.

The box plot is arranged horizontally and shows positive skewness.

The lower whisker starts at 0.2, the value of the Sample Minimum, and extends to the first quartile, 0.4. The box is roughly symmetric, with the mid-line occurring at 0.7, the value of the Sample Median, and the third quartile at 1.04. The upper whisker extends to the upper hinge of about 2.

There are 1889 cases with masses greater than the upper hinge, getting marked as 'outsiders'. The largest of these is 5.01, the value of the Sample Maximum.

In order for us to make use of the text we’ve stored in the alt argument of labs and apply it to the fig-alt key, we have to make use of a couple of functions from {ggplot2}. First, we will need to use the get_alt_text function to extract the value of alt. Second, to give the appropriate input to this function we’ll use the get_last_plot function. This will grab the last plot that was rendered and treat it as the input we desire.

The last step is to use !expr as the start to our value for the fig-alt key. This is a YAML tag literal that will allow the knitr engine to run the two functions listed after this start.
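To see what get_alt_text hands back, you can try it in the console on a stored plot, as in the sketch below. Here I pass the plot object directly instead of going through get_last_plot, which avoids depending on which plot happened to be displayed most recently. The object names diamondBox and storedAlt are my own, and the alt text is a shortened version of the one in Listing 5.3.

```{r}
library(ggplot2)

# Store the alt text in the plot through the alt argument of labs();
# paste() fills in the median from the data at the time the plot is built
diamondBox <- ggplot(
  data = diamonds,
  mapping = aes(x = carat)
) +
  geom_boxplot() +
  labs(
    x = "Mass (ct)",
    alt = paste(
      "A box plot of diamond mass with a median of",
      median(diamonds$carat),
      "carats."
    )
  )

# Pull the stored alt text back out of the plot object
storedAlt <- get_alt_text(diamondBox)
print(storedAlt)
```

The character string that comes back is the value that the !expr line would pass along to the fig-alt key.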

As a second example, check out Listing 5.4, which will create a histogram of randomly sampled values from a Gaussian distribution with Expected Value of 0 and Variance of 1 (Figure 5.3).

Listing 5.4: Code Chunk Showing Extracting Alt Text
```{r}
#| label: fig-simpleHist
#| fig-cap: "Random Sample from Gaussian Distribution"
#| fig-pos: H
#| fig-height: 5
#| fig-alt: !expr get_alt_text(get_last_plot())
#| aria-describedby: simpleHistLD
#| lst-label: lst-figure2
#| lst-cap: "Code Chunk Showing Extracting Alt Text"

# Additional plot showing how to extract the alt text from the plot ----
ggplot(
  data = data.frame(values = rnorm(50, mean = 0, sd = 1)),
  mapping = aes(x = values)
) +
  geom_histogram(
    color = "black",
    fill = "blue",
    binwidth = 0.25,
    closed = "left",
    boundary = 0
  ) +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.05))
  ) +
  labs(
    x = "Observed values",
    y = "Freq.",
    alt = "Histogram of 50 randomly sampled values from a Gaussian distribution that is symmetric, bell-shaped, and centered around zero."
  ) +
  theme_bw()
```
Figure 5.3: Random Sample from Gaussian Distribution
Histogram of 50 randomly sampled values from a Gaussian distribution that is symmetric, bell-shaped, and centered around zero.
Long Description

The horizontal axis is labelled 'Observed values' and goes from -3 to 3 with labels of -3, -2, -1, 0, 1, 2 and 3.

The vertical axis is labelled 'Freq.' and goes from 0 to 8 with labels of 0, 2, 4, 6 and 8.

The bar chart has 23 vertical bars, each 0.25 units wide. The first bar starts at -3 and the last bar ends at 2.75.

The outer bars are relatively short with the bars in the middle being the tallest. The tallest bar covers -0.25 to 0 and has a height of 8.

5.3.3 Adding Long Descriptions

We also talked about using Long Descriptions with our plots/graphs. The ASU Image Accessibility tool and the {BrailleR} package can generate these descriptions for you. The best thing you can do is incorporate these into your narrative text, and use cross-references to connect the descriptions and figure. You can learn more about Long Descriptions at Penn State’s Accessibility page Long Description (2015).

Tip: Best Practice

Use the description generated by the {BrailleR} package or by the ASU tool as a starting point. You should refine and improve upon the text that was generated as you work on your narrative. Your narrative description should appear close to your figure and make use of cross-referencing (see Chapter 6).

5.3.3.1 Long Descriptions and HTML Documents

When we work with HTML documents, like this guide, we have a few additional options at our disposal when working with long descriptions. This includes leveraging ARIA (Accessible Rich Internet Applications) roles and attributes. If you look back at Listing 5.1, Listing 5.3, and Listing 5.4, you’ll see that I used the key aria-describedby as part of the code chunk options. This sets up a connection between the plot and the “Long Description” that appears under each one. These are set up as special disclosure elements where the long descriptions are housed.

This approach allows you to separate your long descriptions from your narrative text. This lets you focus your narrative on the most important aspects of the data visualization while keeping a full description of the plot.

Creating these types of elements and making sure that they are appropriately linked for screen-readers is beyond Stat 184. If you are interested in learning how to make them, contact me and I’ll show you how.

5.4 Multi-pane Visualizations

When you want people to compare data across different data visualizations, proximity is important. That is, you don’t want a person to have to flip more than a page to compare the data visualizations. One common strategy is to put the data visualizations together into one figure. In Quarto there are two ways that we can do this, both of which use a single code chunk.

Important: Plots/Graphs or Tables, Not Both

Any code chunk you make in your Quarto document for a data visualization should either be for a plot/graph or for a table. You should not use the same chunk to house the code for both of these two types of data visualizations.

Caution: Single Tables

Generally speaking, you should only put one table in a code chunk. This is one way in which tables are different from plots/graphs. If you need to put several tables next to or near each other, just put their individual code chunks near one another and use the tbl-pos key with the value H.

5.4.1 Faceting

The first approach that we can use to create multi-pane plots/graphs is through the power of faceting. This is the idea of making a set of “small multiples” of the same style of graph. The {ggplot2} package provides us with two functions that help us achieve this with a single ggplot call:

  • facet_wrap: Create a ribbon of plots that flows from one row to the next, with each plot being a column.
  • facet_grid: Create a rectangular display of plots.

Figure 5.4 provides a way to think about how these two functions end up arranging the plots. We can think about the grid layout as working like an Excel spreadsheet: the columns specify one set of groupings while rows specify a second set of groupings.

Figure 5.4: Grid vs. Wrap Layout (Wickham et al., n.d.)

This is different from wrapping, which moves sequentially left-to-right through your specified number of columns before moving down to the next row.

Another way you can think about the difference between these two faceting functions is to ask yourself how many case attributes you want to use to create small multiples. If you only want to use a single attribute, use facet_wrap; if you want to use two, use facet_grid.
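The one-attribute versus two-attribute rule of thumb can be sketched with the mpg data set that ships with {ggplot2}. The choice of data, the attributes used for faceting (class, drv, and cyl), and the object names are mine and are only meant as an illustration.

```{r}
library(ggplot2)

# A shared base plot that both faceting functions will split up
basePlot <- ggplot(
  data = mpg,
  mapping = aes(x = displ, y = hwy)
) +
  geom_point() +
  theme_bw()

# One attribute: panels for each vehicle class fill three columns,
# then wrap onto the next row
wrapPlot <- basePlot +
  facet_wrap(facets = vars(class), ncol = 3)

# Two attributes: drive train defines the rows, cylinder count the columns
gridPlot <- basePlot +
  facet_grid(rows = vars(drv), cols = vars(cyl))

wrapPlot
gridPlot
```

In a real document, each of these two plots would go in its own code chunk with its own label, caption, and alt text.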

5.4.1.1 Alt Text and Long Descriptions

When faceting, you will need to be careful with your alt text and long descriptions. You only get ONE alt text for the entire faceted set of plots. Thus, your alt text is going to be more general than you might otherwise use. In turn, your long description is going to have to be more robust, covering every facet of your multi-pane plot/graph.

5.4.2 Sub-Figures

Alternatively, suppose that you want to put different types of plots next to each other AND make sure that each one gets an appropriate label. This is the idea of creating sub-figures and is different from faceting. Faceting is essentially one plot that is broken up into several smaller pieces; sub-figures are entirely separate plots that are just bound together into one figure environment.

When creating a figure composed of several sub-figures, you need to keep in mind that you will have

  • One code chunk for the entire set of sub-figures.
  • One label with the fig- prefix for the entire set.
  • One caption covering the entire set (i.e., fig-cap).
  • Multiple sub-captions, one for each sub-figure.
  • Multiple alt texts, one for each sub-figure.
  • A decision about how many columns (or rows) you want to use to arrange the sub-figures.
  • And the code to create each of the plots that make up all of the sub-figures.

With these in mind, we can go about creating a figure composed of sub-figures. Listing 5.5 shows the first part of the single code chunk that we’ll use; in particular the code chunk options. Notice that we only have one label and one fig-cap. Under the fig-subcap key, we’ll use YAML nesting with a hyphen (i.e., -) to denote the sub-captions to be applied to each sub-figure. For alt text, we’ll use the same strategy of YAML nesting. The layout-ncol key allows us to specify how many columns we want to use. Alternatively, you can use layout-nrow to specify the number of rows. I would not recommend using both. If you want even more control, you can use the layout key; see Custom Layout for an example.

Listing 5.5: Code Chunk Showing Creating Sub-Figures
#| label: fig-subFigures
#| fig-cap: "Random Samples of Size 50 from Different Distributions"
#| fig-subcap: 
#|   - "Gaussian, $\\mu=0$, $\\sigma^2=1$"
#|   - "Exponential, $\\lambda=1$"
#|   - "Beta, $\\alpha=1$, $\\beta=3$"
#| fig-alt:
#|   - "Histogram of 50 randomly sampled values from a Gaussian distribution that is symmetric, bell-shaped, and centered around zero."
#|   - "Histogram of 50 randomly sampled values from an Exponential distribution, showing positive skewness."
#|   - "Histogram of 50 randomly sampled values from a Beta distribution with slight positive skewness."
#| layout-ncol: 2
#| aria-describedby: subFiguresLD

# Creating a Figure with Sub-figures ----
## Code for the first sub-figure ----
# [Code omitted]
Figure 5.5: Random Samples of Size 50 from Different Distributions
(a) Gaussian, \(\mu=0\), \(\sigma^2=1\)
Histogram of 50 randomly sampled values from a Gaussian distribution that is symmetric, bell-shaped, and centered around zero.
(b) Exponential, \(\lambda=1\)
Histogram of 50 randomly sampled values from an Exponential distribution, showing positive skewness.
(c) Beta, \(\alpha=1\), \(\beta=3\)
Histogram of 50 randomly sampled values from a Beta distribution with slight positive skewness.
Long Description

This is a set of three plots of 50 randomly sampled values from three different distributions, in a 2 by 2 grid. The distributions include a Gaussian, an Exponential, and a Beta. Each plot will be described in order left-to-right, top-to-bottom.

In the upper left is a Gaussian distribution with an Expected Value of 0 and Variance of 1.

The horizontal axis is labelled 'Observed values' and goes from about -3 to 2.25 with labels of -2, -1, 0, 1, and 2.

The vertical axis is labelled 'Freq.' and goes from 0 to 7 with labels of 0, 2, 4, and 6.

The bar chart has 16 vertical bars, each 0.25 units wide. The first bar starts at -2.75 and the last bar ends at 2.

The outer bars are relatively short with the bars in the middle being the tallest. The tallest bar covers 0 to 0.25 and has a height of 7.

In the upper right is an Exponential distribution with an Expected Value of 1.

The horizontal axis is labelled 'Observed values' and goes from about 0 to 4.25 with labels of 0, 1, 2, 3, and 4.

The vertical axis is labelled 'Freq.' and goes from 0 to 11.5 with labels of 0, 3, 6, and 9.

The bar chart has 14 vertical bars, each 0.25 units wide. The first bar starts at 0 and the last bar ends at 4.

The tallest bars are to the left (towards 0) with the bars getting shorter as they approach 4. The tallest bar is for the interval 0 to 0.25 and has a height of 11.

In the lower left is a Beta distribution with an Alpha value of 1 and Beta value of 3.

The horizontal axis is labelled 'Observed values' and goes from about 0 to 0.75 with labels of 0, 0.2, 0.4, and 0.6.

The vertical axis is labelled 'Freq.' and goes from 0 to 17.5 with labels of 0, 5, 10, and 15.

The bar chart has 7 vertical bars, each 0.1 units wide. The first bar starts at 0 and the last bar ends at 0.7.

The tallest bars are to the left (towards 0) with the bars getting shorter as they approach 0.75. The tallest bar is for the interval 0 to 0.1 and has a height of 17.

When you are creating sub-figures, you must keep the ordering of the sub-figures in mind. Your code, sub-captions, and alt text need to be in the same order, top-to-bottom. The rendered plots will get placed left-to-right to fill up each row before moving down to the next row.

I highly recommend that you use the above approach to using sub-figures as this will create cross-referenceable sub-figures in a reproducible manner.
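To make the ordering concrete, here is one way a complete sub-figure chunk might look. This is a sketch under my own assumptions and is not the code behind Figure 5.5: the label, the simplified captions, the seed, and the helper function makeHistogram are all invented for this example.

```{r}
#| label: fig-subFigureSketch
#| fig-cap: "Random Samples of Size 50 from Different Distributions"
#| fig-subcap:
#|   - "Gaussian"
#|   - "Exponential"
#|   - "Beta"
#| fig-alt:
#|   - "Histogram of 50 randomly sampled values from a Gaussian distribution."
#|   - "Histogram of 50 randomly sampled values from an Exponential distribution."
#|   - "Histogram of 50 randomly sampled values from a Beta distribution."
#| layout-ncol: 2

library(ggplot2)

# Fix the seed so the random samples are the same on every render
set.seed(184)

# Hypothetical helper that draws one histogram in a shared style
makeHistogram <- function(values, binwidth) {
  ggplot(
    data = data.frame(values = values),
    mapping = aes(x = values)
  ) +
    geom_histogram(
      color = "black",
      fill = "blue",
      binwidth = binwidth,
      closed = "left",
      boundary = 0
    ) +
    labs(x = "Observed values", y = "Freq.") +
    theme_bw()
}

gaussianPlot <- makeHistogram(rnorm(50, mean = 0, sd = 1), binwidth = 0.25)
exponentialPlot <- makeHistogram(rexp(50, rate = 1), binwidth = 0.25)
betaPlot <- makeHistogram(rbeta(50, shape1 = 1, shape2 = 3), binwidth = 0.1)

# Display the plots in the same order as the sub-captions and alt text
gaussianPlot
exponentialPlot
betaPlot
```

With layout-ncol set to 2, the first two plots would fill the top row and the third would start the second row.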

5.4.2.1 Alt Text and Long Descriptions

Unlike faceting, each sub-figure can have its own alt text. However, the alt text needs to be set manually (see Section 5.3.1). Keep in mind the best practices for alt text.

For long descriptions, we follow the same process as for a single plot. The only catch here is that you now have more plots all in the same space. You’ll have to be careful with your descriptions to make sure that they match up with the appropriate plot.